Median

Learn how to use the median to improve the accuracy of anomaly detection.

In statistics, the mean is not considered robust because extreme values influence it. In our use case, this means that the measure we use to identify extreme values is itself affected by the very values we are trying to identify.

For example, at the beginning of the article, we used this series of values:

Array of values: 2, 3, 5, 2, 3, 12, 5, 3, 4

The mean of this series is 4.33, and we detected 12 as an anomaly.

If the 12 were 120 instead, the mean of the series would have been 16.33. Hence, our “reasonable” value is heavily affected by the values it is supposed to identify.
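To check the arithmetic above, we can recompute both means with Python's standard statistics module. This is a minimal sketch that is separate from the PostgreSQL workflow; the variable names are illustrative only.

```python
from statistics import mean

# The series from the beginning of the article (sum 39, nine values)
values = [2, 3, 5, 2, 3, 12, 5, 3, 4]

# The same series with the outlier 12 replaced by 120 (sum 147, nine values)
values_120 = [2, 3, 5, 2, 3, 120, 5, 3, 4]

print(round(mean(values), 2))      # 4.33
print(round(mean(values_120), 2))  # 16.33
```

A single value moving from 12 to 120 shifts the mean by 12 (108 / 9), which is exactly the sensitivity that makes the mean a poor yardstick for spotting that same value.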

The median is considered a more robust measure. The median of a series is its middle value: half of the values are less than or equal to it, and half are greater than or equal to it.

To calculate the median in PostgreSQL, we use the function percentile_disc. In the series above, the median is 3. Sorting the nine values and cutting the list in the middle makes this clearer; the median is the fifth value:

2, 2, 3, 3, 3

4, 5, 5, 12
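In PostgreSQL, percentile_disc is an ordered-set aggregate, written as percentile_disc(0.5) WITHIN GROUP (ORDER BY n), where n stands for a hypothetical numeric column. According to the PostgreSQL documentation, it returns the first input value whose position in the sorted order equals or exceeds the requested fraction. The sketch below mimics that behavior in Python; the helper name percentile_disc is our own and is not a real Python library function.

```python
import math

def percentile_disc(values, fraction):
    """Hypothetical helper mimicking PostgreSQL's percentile_disc:
    return the first value in sorted order whose position, as a
    fraction of the row count, is >= fraction."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if not values:
        # PostgreSQL returns NULL for an empty group; None plays that role here
        return None
    ordered = sorted(values)
    # 1-based position of the first row that reaches the requested fraction
    position = max(1, math.ceil(fraction * len(ordered)))
    return ordered[position - 1]

values = [2, 3, 5, 2, 3, 12, 5, 3, 4]
print(sorted(values))                # [2, 2, 3, 3, 3, 4, 5, 5, 12]
print(percentile_disc(values, 0.5))  # 3 (ceil(0.5 * 9) = 5, the fifth value)
```

Because percentile_disc always returns an actual value from the series, an even-length series yields the lower of the two middle values (for [1, 2, 3, 4] it returns 2). PostgreSQL's percentile_cont interpolates between them instead.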

If we change the value of 12 to 120, the median will not be affected at all:

2, 2, 3, 3, 3

4, 5, 5, 120

This is why the median is considered more robust than the mean.
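The 12 versus 120 comparison can be reproduced with Python's statistics.median, as a minimal sketch independent of PostgreSQL. For an odd number of values it returns the middle element, which matches percentile_disc(0.5) here; for an even number of values it averages the two middle elements, so the two can differ on other data.

```python
from statistics import median

original = [2, 3, 5, 2, 3, 12, 5, 3, 4]
# The same series with the outlier inflated from 12 to 120
inflated = [2, 3, 5, 2, 3, 120, 5, 3, 4]

# Sorting puts the outlier at the end in both cases,
# so the middle (fifth) value is 3 either way
print(median(original))  # 3
print(median(inflated))  # 3
```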
